---
id: crewai
title: CrewAI
sidebar_label: CrewAI
---

import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
import { Timeline, TimelineItem } from "@site/src/components/Timeline";
import VideoDisplayer from "@site/src/components/VideoDisplayer";
import ColabButton from "@site/src/components/ColabButton";

# CrewAI

[CrewAI](https://www.crewai.com/) is a lean, independent Python framework designed for creating and orchestrating autonomous multi-agent AI systems, offering high flexibility, speed, and precision control for complex automation tasks.

:::tip
We recommend logging in to [Confident AI](https://app.confident-ai.com) to view your CrewAI evaluation traces.

```bash
deepeval login
```

:::

## End-to-End Evals

`deepeval` allows you to evaluate CrewAI applications end-to-end in **under a minute**.

<Timeline>

<TimelineItem title="Configure CrewAI">

Create a `Crew` and use `instrument_crewai` to instrument your CrewAI application.

```python title="main.py" showLineNumbers
import random

from crewai import Task, Crew, Agent
from crewai.tools import tool

from deepeval.integrations.crewai import instrument_crewai

instrument_crewai()

@tool
def get_weather(city: str) -> str:
    """Fetch weather data for a given city. Returns temperature and conditions."""
    weather_data = {
        "New York": "Partly Cloudy",
        "London": "Rainy",
        "Tokyo": "Sunny",
        "Paris": "Cloudy",
        "Sydney": "Clear",
    }

    condition = weather_data.get(city, "Clear")
    temperature = f"{random.randint(45, 95)}°F"
    humidity = f"{random.randint(30, 90)}%"

    return f"Weather in {city}: {temperature}, {condition}, Humidity: {humidity}"


agent = Agent(
    role="Weather Reporter",
    goal="Provide accurate and helpful weather information to users.",
    backstory="An experienced meteorologist who loves helping people plan their day with accurate weather reports.",
    tools=[get_weather],
    verbose=True,
)

task = Task(
    description="Get the current weather for {city} and provide a helpful summary.",
    expected_output="A clear weather report including temperature, conditions, and humidity.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
)
```

:::info
Evaluations are supported for the CrewAI `Agent`. Only metrics that use the `input`, `output`, `expected_output`, and `tools_called` parameters are eligible for evaluation.
:::

</TimelineItem>
<TimelineItem title="Run evaluations">

Create an `EvaluationDataset` and invoke your CrewAI application for each golden inside the `evals_iterator()` loop to run end-to-end evaluations, passing your metrics to the `trace` context manager via its `trace_metrics` argument.

<Tabs groupId="crewai">
<TabItem value="synchronous" label="Synchronous">

```python title="main.py" showLineNumbers
from deepeval.tracing import trace
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.dataset import EvaluationDataset, Golden

answer_relevancy_metric = AnswerRelevancyMetric()

dataset = EvaluationDataset(
    goldens=[
        Golden(input="London"),
        Golden(input="Paris"),
    ]
)

for golden in dataset.evals_iterator():
    with trace(trace_metrics=[answer_relevancy_metric]):
        crew.kickoff({"city": golden.input})
```

</TabItem>
  <TabItem value="asynchronous" label="Asynchronous">

```python title="main.py" showLineNumbers
import asyncio

from deepeval.tracing import trace
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.dataset import EvaluationDataset, Golden

answer_relevancy_metric = AnswerRelevancyMetric()

dataset = EvaluationDataset(
    goldens=[
        Golden(input="London"),
        Golden(input="Paris"),
    ]
)

async def run_crewai_e2e_async(input: str):
    with trace(trace_metrics=[answer_relevancy_metric]):
        await crew.kickoff_async({"city": input})

for golden in dataset.evals_iterator():
    task = asyncio.create_task(run_crewai_e2e_async(golden.input))
    dataset.evaluate(task)
```

</TabItem>
</Tabs>

✅ Done. The `evals_iterator` will automatically generate a test run with individual evaluation traces for each golden.

</TimelineItem>
<TimelineItem title="View on Confident AI (optional)">

<VideoDisplayer
  src="https://confident-docs.s3.us-east-1.amazonaws.com/end-to-end%3Acrewai-4k-no-zoom.mp4"
/>

</TimelineItem>

</Timeline>

:::note
If you need to evaluate individual components of your CrewAI application, [set up tracing](/docs/evaluation-llm-tracing) instead.
:::

## Evals in Production

To run online evaluations in production, replace `trace_metrics` with a [metric collection](https://documentation.confident-ai.com/docs/llm-tracing/evaluations#online-evaluations) string from Confident AI, passed via `trace_metric_collection`, and push your CrewAI agent to production.

```python title="main.py" showLineNumbers
...
with trace(trace_metric_collection="test_collection_1"):
    result = crew.kickoff({"city": "London"})
```